Module 01 · Foundations

What is AutoGen?

Microsoft Research's framework for building multi-agent AI systems that can reason, collaborate, and execute code.

AutoGen is an open-source framework from Microsoft Research that lets you build systems where multiple AI agents work together to solve complex tasks. Think of it as a runtime for AI teamwork — agents can talk to each other, use tools, write and execute code, and ask humans for input.

Unlike workflow tools (n8n, Zapier) that execute deterministic steps, AutoGen agents reason autonomously. They decide how to solve a problem, not just follow a pre-written path.

Why does AutoGen exist?

🧩

Single LLM Limits

One LLM has a limited context window and can hallucinate. Multiple agents can verify each other, split tasks, and specialize.

🔁

Iterative Refinement

Agents can critique and improve each other's outputs — a critic agent reviewing a writer agent's code, for instance.

🛠️

Tool Execution

Agents can write Python, run it in a sandbox, see the result, fix bugs — closing the loop between planning and execution.

🧑‍💻

Human in the Loop

You control how much autonomy agents have. Inject human approval at any step, or let them run fully automated.

The Big Picture

Human / Task

→

UserProxy Agent

↔

AssistantAgent

→

Code Executor

→

Result

💡

AutoGen v0.4 (AgentChat) is the current stable API. It introduced a cleaner async-first design with AssistantAgent, UserProxyAgent, and GroupChat. This course uses v0.4 patterns.

AutoGen vs. The World

Framework	Paradigm	Best For
AutoGen	Autonomous multi-agent conversation	Complex reasoning, code generation, research tasks
LangGraph	Stateful graph-based workflows	Fine-grained control over agent state & branching
CrewAI	Role-based agent teams	Business automation with defined roles
n8n	Deterministic workflow automation	Integrating SaaS tools with predictable logic

🧠 Quick Check: What fundamentally distinguishes AutoGen from n8n?

A AutoGen is newer and faster

B AutoGen agents reason autonomously; n8n executes deterministic steps

C n8n can't use AI; AutoGen can

D AutoGen only works with OpenAI models

Module 02 · Foundations

Core Concepts

The building blocks: agents, conversations, termination, and the LLM config pattern.

The Two Primary Agents

🤖

AssistantAgent

Powered by an LLM. Receives messages, reasons, and replies. It can suggest code, call functions, and generate plans. Doesn't execute code by default.

👤

UserProxyAgent

Represents a human or an executor. Can run code that the AssistantAgent produces, then feed results back. May prompt a real human for approval.

LLM Configuration

llm_config.py

import autogen
llm_config = {
    "config_list": [{ "model": "gpt-4o", "api_key": "sk-..." }],
    "temperature": 0.1,
    "cache_seed": 42,
}
config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")

Conversations & Termination

termination.py

# 1. Max turns
user_proxy.initiate_chat(assistant, max_turns=5)
# 2. Keyword
assistant = autogen.AssistantAgent(system_message="...reply TERMINATE when done")
# 3. Custom function
def my_term(msg): return "task_complete" in (msg["content"] or "").lower()
user_proxy = autogen.UserProxyAgent(is_termination_msg=my_term)

Human Input Modes

Mode	Behavior	Use Case
ALWAYS	Asks human at every step	Interactive sessions, demos
TERMINATE	Asks human only on termination	Approval gate at the end
NEVER	Fully autonomous	Production pipelines

⚠️

Code Execution Safety: Always use Docker sandbox or restricted paths in production. Never run untrusted agent code on bare metal.

🧠 Which agent actually runs Python code that the other agent writes?

A AssistantAgent

B UserProxyAgent

C Both equally

D Neither

Module 03 · Foundations

Your First Agents

Build a working two-agent system from 15 lines of Python.

Installation

terminal

pip install pyautogen
pip install pyautogen[docker]  # for Docker sandbox
export OPENAI_API_KEY="sk-..."

Hello World: Two-Agent System

hello_autogen.py

import autogen
llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "sk-..."}]}

assistant = autogen.AssistantAgent(
    name="assistant", llm_config=llm_config,
    system_message="You are a Python expert. When done, reply: TERMINATE"
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy", human_input_mode="NEVER",
    is_termination_msg=lambda x: "TERMINATE" in (x.get("content") or ""),
    code_execution_config={"work_dir": "coding", "use_docker": False}
)
user_proxy.initiate_chat(assistant, message="Print the first 10 Fibonacci numbers.")

✅

No code block? If the reply contains no code, user_proxy sends "There is no code from the last message, provide the code." — automatically nudging the assistant.

🧠 What triggers the conversation to stop in the example above?

A A max_turns limit of 10

B Code executing successfully

C The assistant including "TERMINATE" in its reply

D AutoGen auto-detects task completion

Module 04 · Patterns

Conversation Patterns

Two-agent, sequential chaining, nested chats, Swarm — and when to use each.

Pattern 1: Two-Agent (Default)

One user_proxy, one assistant. Best for focused single tasks: code generation, Q&A, analysis.

Pattern 2: Sequential Chaining

sequential.py

r1 = user_proxy.initiate_chat(writer, message="Write a blog post about RAG")
r2 = user_proxy.initiate_chat(critic, message=f"Review:\n{r1.summary}")
r3 = user_proxy.initiate_chat(editor, message=f"Apply feedback:\n{r2.summary}")

Pattern 3: Nested Chats

nested.py

assistant.register_nested_chats(
    trigger=user_proxy,
    chat_queue=[{ "recipient": specialist, "summary_method": "last_msg", "max_turns": 3 }]
)

Pattern 4: Swarm (v0.4)

swarm.py

from autogen import SwarmAgent, initiate_swarm_chat
triage = SwarmAgent(name="triage", handoffs=["billing", "tech", "sales"])
initiate_swarm_chat(initial_agent=triage, agents=[triage, billing, tech, sales],
    messages="Can't access account after payment failed")

🗺️

Single task → Two-agent. Pipeline → Chaining. Sub-tasks → Nested. Dynamic routing → Swarm. Team collaboration → Group Chat.

Module 05 · Patterns

Tool Use & Function Calling

Give agents real-world capabilities: web search, database queries, API calls.

Defining Tools with Decorators

tools.py

@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Get current weather for a city")
def get_weather(city: str) -> str:
    return f"Weather in {city}: 22°C, sunny"

@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Search the web for current info")
def web_search(query: str) -> str:
    return f"Results for '{query}': ..."

How Tool Calling Works

Task

→

LLM decides to call

→

proxy executes fn

→

result injected

→

LLM uses result

🔧

Type hints matter — AutoGen uses them to generate JSON schema. Descriptions are prompts — the LLM reads them to decide when to call. Return strings — tools should return str or JSON-serializable values.

🧠 Why does the assistant need register_for_llm and user_proxy needs register_for_execution?

A Just boilerplate — identical under the hood

B LLM needs to know about the tool; executor actually runs it

C Both agents run it independently

D Only needed for GPT-4

Module 06 · Patterns

Group Chat

Orchestrate 3+ specialized agents collaborating on a shared task.

group_chat.py

planner = autogen.AssistantAgent(name="Planner", llm_config=llm_config,
    system_message="Break tasks into subtasks and assign them.")
coder = autogen.AssistantAgent(name="Coder", llm_config=llm_config,
    system_message="Write high-quality Python. No prose, just code.")
critic = autogen.AssistantAgent(name="Critic", llm_config=llm_config,
    system_message="Review code for bugs, edge cases, style.")

groupchat = autogen.GroupChat(
    agents=[user_proxy, planner, coder, critic],
    messages=[], max_round=12, speaker_selection_method="auto"
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
user_proxy.initiate_chat(manager, message="Build a FastAPI sentiment endpoint")

Speaker Selection

Method	How	Best For
`auto`	LLM picks most relevant agent	General purpose
`round_robin`	Agents take turns in order	Structured loops
`random`	Random each turn	Diverse perspectives
custom fn	Your function decides	Complex routing

⚠️

Token costs: Each agent sees the full history. Use max_round limits and concise system messages.

Module 07 · Production

Memory & RAG

Give agents long-term memory with vector stores and retrieval-augmented generation.

Built-in RAG: RetrieveUserProxyAgent

rag_agent.py

from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
rag = RetrieveUserProxyAgent(name="rag", retrieve_config={
    "task": "qa", "docs_path": ["./docs/"],
    "model": "gpt-4o", "vector_db": "chroma",
    "collection_name": "my_docs", "get_or_create": True,
}, code_execution_config=False, human_input_mode="NEVER")
rag.initiate_chat(assistant, problem="What does our API return on auth failure?")

Memory Architecture Patterns

🔍

Vector DB (Semantic)

Embeddings of past conversations, docs, or facts. Retrieve by meaning. Best for knowledge bases, long-term recall.

🗄️

SQL DB (Structured)

Store structured facts: preferences, task history, entities. Best for user profiles, audit trails.

🕸️

Graph DB (Relational)

Model relationships between entities. Best for knowledge graphs, multi-hop reasoning.

⚡

In-Context (Short-term)

AutoGen manages conversation history automatically. Limited by token budget — summarize old turns.

🏗️

Production pattern: Vector DB for semantic recall + SQL for structured facts + in-context summarization for recent turns. Tri-layer architecture handles the full spectrum.

Module 08 · Production

AutoGen vs. Other Frameworks

When to use AutoGen, when to use something else, and how to combine them.

Scenario	Best Pick	Why
AI writes & debugs code autonomously	AutoGen	Code execution loop + multi-agent review
Research: gather, analyze, synthesize	AutoGen	Autonomous reasoning + tool use
Strict step-by-step workflow	LangGraph / n8n	Deterministic control flow
Role-based teams (PM, dev, QA)	CrewAI	First-class role/goal/task primitives
SaaS integration automation	n8n	500+ no-code connectors
Complex RAG + reasoning	LlamaIndex + AutoGen	Best of both

✅

Strengths

• Autonomous code writing + execution
• Flexible conversation patterns
• Human-in-the-loop at any granularity
• Active research + rapid updates

⚠️

Weaknesses

• Less deterministic than graph tools
• Token costs escalate in group chats
• Multi-agent debugging is hard
• v0.4 API still maturing

🧠 Final: Route customer emails to billing/tech/sales using AI. Best combo?

A Pure AutoGen group chat

B Pure n8n with keyword filtering

C n8n triggers → AutoGen Swarm routes → n8n sends

D CrewAI with three agents

🎓

Course Complete — Now Build Something Real

You've covered all 8 modules. Time to apply everything in a full project — the Pizza Order Bot.

Section 01 · Design

Architecture

Why multi-agent for a pizza bot? What each agent does. How this differs from Dialogflow CX or Amazon Lex.

You've already built this bot in Dialogflow CX (state machine + NLU) and Amazon Lex (slots + Lambda). The AutoGen version works completely differently — instead of a pre-wired state machine, you have agents that reason their way through the conversation.

No pages, no routes, no explicit state transitions. The agents decide what to ask, when to validate, and when to submit.

Dialogflow CX vs AutoGen — Same Bot, Different Soul

Dimension	Dialogflow CX	AutoGen
Flow control	State machine (Pages → Routes)	Agents reason and decide
Input handling	Slot filling + entity types	LLM extracts intent + entities
Validation	Regex on slot values	Validator agent checks order object
Flexibility	Rigid — changes need flow redesign	High — change system prompt
Debugging	Visual flow trace in console	Agent message log, print statements

System Architecture

👤 Customer

natural language

⟺

🤖 OrderAgent

AssistantAgent (LLM)

⟺

🛡️ ValidatorAgent

AssistantAgent (LLM)

↓

🔧 UserProxyAgent

executes tools

get_menu()

calc_price()

validate_order()

submit_order()

Conversation Flow

Customer:"Large margherita, extra cheese, no olives"

OrderAgent:[calls get_menu()] → confirms items exist

OrderAgent:[calls calc_price(size="large", pizza="margherita", extras=["extra_cheese"])]

OrderAgent:[calls validate_order()] → ValidatorAgent returns "VALID"

OrderAgent→Customer:"Large Margherita, extra cheese, no olives — $20.00. Confirm?"

Customer:"Yes please"

UserProxy:[calls submit_order()] → ORD-48291

OrderAgent:"Order #ORD-48291 placed! Ready at 14:35. TERMINATE"

🍕

Key insight: The OrderAgent never follows a fixed script. It reasons about what's missing, asks naturally, and uses tools when ready. "I want a large, actually XL, pepperoni, and can I add mushrooms?" — handled naturally.

Project 1 / 9

Section 02 · Design

The Agents

Three agents with distinct roles. The soul of the system lives in the prompts.

🤖

OrderAgent — AssistantAgent

Customer-facing. Conducts the conversation, collects pizza details, calls tools for prices, passes a structured order to ValidatorAgent, then submits on confirmation.

🛡️

ValidatorAgent — AssistantAgent

Called as a tool. Receives order fields, checks pizza type, size, crust, toppings, and price plausibility. Returns VALID or INVALID: [issues].

🔧

UserProxy — UserProxyAgent

Executes all tools. human_input_mode=NEVER for automation, or uses smart_human_reply for interactive terminal sessions.

OrderAgent System Prompt (key excerpt)

ORDER_AGENT_PROMPT

Your job flow:
1. Greet the customer warmly
2. Call get_menu() once (silently) to know what's available
3. Collect: pizza type, size, crust (default: thin), extras, removals
4. Call calc_price() once you have all details
5. Call validate_order() — read VALID or INVALID response
6. If VALID → present summary + ask for confirmation
7. If INVALID → fix issues, recalculate, re-validate
8. On confirmation → call submit_order()
9. Reply with order ID + ETA, then: TERMINATE

💡

Why a separate ValidatorAgent? The OrderAgent focuses on conversation and can make small reasoning errors assembling the final order. The Validator is a second LLM pass with a narrow, strict task — catching issues before they reach the kitchen. Classic critic-agent pattern.

Project 2 / 9

Section 03 · Build

Project Setup

Install dependencies, create the project structure, configure your API key.

File Structure

pizza_bot/
  ├── main.py      # entry point — two modes: automated + interactive
  ├── agents.py    # agent definitions, prompts, tool registration
  ├── tools.py     # get_menu, calc_price, submit_order
  ├── menu.py      # pizza data — prices, toppings, crusts
  ├── config.py    # LLM_CONFIG from .env
  ├── .env         # OPENAI_API_KEY=sk-...
  └── requirements.txt

Installation

terminal

python -m venv .venv
source .venv/bin/activate
pip install pyautogen python-dotenv

config.py

import os
from dotenv import load_dotenv
load_dotenv()
LLM_CONFIG = {
    "config_list": [{"model": "gpt-4o", "api_key": os.getenv("OPENAI_API_KEY")}],
    "temperature": 0.2, "cache_seed": None,
}

Project 3 / 9

Section 04 · Build

Menu & Tools

The pizza data and the four tool functions agents use to interact with it.

menu.py

PIZZAS = {
    "margherita":  {"base_price": {"small":10,"medium":14,"large":18,"xl":22},
                    "default_toppings": ["mozzarella","tomato_sauce","basil"]},
    "pepperoni":   {"base_price": {"small":12,"medium":16,"large":20,"xl":24},
                    "default_toppings": ["mozzarella","tomato_sauce","pepperoni"]},
    "bbq_chicken": { ... },
    "veggie":      { ... },
}
EXTRA_TOPPINGS = {"extra_cheese":2.0, "mushroom":1.5, "bacon":2.5, ...}
CRUSTS = ["thin", "thick", "stuffed", "gluten_free"]
CRUST_UPCHARGE = {"stuffed": 3.0, "gluten_free": 2.0}

Key Tool Functions

tools.py

def get_menu() -> str:
    """Return full menu as JSON string."""
    return json.dumps({"pizzas": ..., "extra_toppings": ..., "crusts": ...})

def calc_price(pizza_type: str, size: str, crust: str="thin", extra_toppings: list=None) -> str:
    """Calculate total price. Returns JSON with total + breakdown."""
    total = PIZZAS[pizza_type]["base_price"][size] + CRUST_UPCHARGE.get(crust, 0)
    for t in (extra_toppings or []): total += EXTRA_TOPPINGS[t]
    return json.dumps({"total": round(total, 2), "breakdown": {...}})

def submit_order(pizza_type, size, crust, extra_toppings, remove_toppings, total_price, customer_name="Guest") -> str:
    """Submit order to kitchen. Returns order_id + ETA."""
    order_id = f"ORD-{random.randint(10000,99999)}"
    # In production: write to DB, call kitchen API, send SMS...
    return json.dumps({"order_id": order_id, "eta": eta, "message": "Confirmed!"})

Project 4 / 9

Section 05 · Build

Order Agent

Creating the OrderAgent and registering tools with the two-decorator pattern.

agents.py — register_tools()

def register_tools(order_agent, validator_agent, user_proxy):

    @user_proxy.register_for_execution()
    @order_agent.register_for_llm(description="Fetch the full pizza menu")
    def _get_menu() -> str: return get_menu()

    @user_proxy.register_for_execution()
    @order_agent.register_for_llm(description="Calculate total price")
    def _calc_price(pizza_type: str, size: str, crust: str="thin", extra_toppings: list=None) -> str:
        return calc_price(pizza_type, size, crust, extra_toppings)

    @user_proxy.register_for_execution()
    @order_agent.register_for_llm(description="Validate order before confirming with customer")
    def _validate_order(pizza_type: str, size: str, crust: str,
        extra_toppings: list, remove_toppings: list, price: float) -> str:
        # Spins up a one-shot ValidatorAgent chat, returns "VALID" or "INVALID: ..."
        order_str = json.dumps({"pizza_type": pizza_type, "size": size, ...})
        vproxy = autogen.UserProxyAgent(name="vp", human_input_mode="NEVER",
            is_termination_msg=lambda x: True, code_execution_config=False)
        vproxy.initiate_chat(validator_agent,
            message=f"Validate:\n{order_str}", max_turns=1, silent=True)
        history = validator_agent.chat_messages.get(vproxy, [])
        for m in reversed(history):
            if m["role"] == "assistant": return m["content"]

    @user_proxy.register_for_execution()
    @order_agent.register_for_llm(description="Submit confirmed order. Call ONLY after customer says yes.")
    def _submit_order(pizza_type, size, crust, extra_toppings, remove_toppings, total_price, customer_name="Guest") -> str:
        return submit_order(pizza_type, size, crust, extra_toppings, remove_toppings, total_price, customer_name)

🔍

Why wrap the functions? The @ decorators must be applied at definition time in the function scope so AutoGen can capture correct references and bind them to the right agents.

Project 5 / 9

Section 06 · Build

Validator Agent

Called as a tool by the OrderAgent — a one-shot LLM check on the assembled order.

VALIDATOR_PROMPT

You are a strict pizza order validator for PizzaLab.

Check ALL of the following:
1. pizza_type is one of: margherita, pepperoni, bbq_chicken, veggie
2. size is one of: small, medium, large, xl
3. crust is one of: thin, thick, stuffed, gluten_free
4. All extra_toppings are on the extras menu
5. Price is plausible for size + extras

Respond with EXACTLY one of:
- "VALID"
- "INVALID: [issue 1]; [issue 2]"

No extra text. Be terse.

⚠️

Learned lesson — don't use nested chats here. We tried register_nested_chats — it fired on every turn and couldn't reliably find the order JSON. The fix: validate_order() is a regular tool that spins up a one-shot initiate_chat internally. Explicit, deterministic, visible in the tool log.

Project 6 / 9

Section 07 · Build

Wiring It Together

main.py assembles all agents and handles both automated testing and real interactive sessions cleanly.

main.py — interactive mode (the tricky part)

# KEY LESSON: human_input_mode="ALWAYS" prompts on EVERY turn
# including tool-call turns. Fix: use NEVER + a smart reply function.

def smart_human_reply(recipient, messages, sender, config):
    last = messages[-1] if messages else {}
    role = last.get("role", "")

    if role == "tool":          # tool result turn — pass through
        return False, None
    if last.get("tool_calls"):  # tool call turn — pass through
        return False, None       # ← CRITICAL: prevents 400 BadRequestError

    # Genuine conversational turn — ask the human
    human_input = input("You: ").strip()
    if human_input.lower() == "exit": return True, "exit"
    return True, human_input

user_proxy = autogen.UserProxyAgent(name="customer", human_input_mode="NEVER", ...)
user_proxy.register_reply(trigger=[autogen.AssistantAgent, None],
    reply_func=smart_human_reply, position=0)

Three Bugs We Fixed Building This

Bug	Root Cause	Fix
Infinite validation loop	register_nested_chats fires on every turn	validate_order() as a regular tool
TypeError: NoneType not iterable	content=None on tool-call messages	`(x.get("content") or "")`
400 BadRequestError	Empty reply inserted between tool_call and tool_result	Guard on `last.get("tool_calls")`

Project 7 / 9

Section 08 · Production

Running & Testing

Two ways to run the bot. What to expect. What to test.

Run Commands

terminal

# Automated test (scripted replies, no typing needed)
python main.py

# Interactive mode (you type each customer reply)
INTERACTIVE=1 python main.py

Interactive Session Preview

OrderAgent:Welcome to PizzaLab! What can I get for you today?

You:I'd like a margherita pizza

OrderAgent:Great choice! What size — small, medium, large, or XL?

You:Large please

OrderAgent:Thin, thick, stuffed, or gluten-free crust?

You:Stuffed crust, and add extra cheese

OrderAgent:Large margherita, stuffed crust, extra cheese — $23.00. Shall I place this order?

You:Yes!

UserProxy:[submit_order() → ORD-58291]

OrderAgent:Order #ORD-58291 confirmed! Ready at 14:35. TERMINATE

Edge Cases to Test

Input	Expected
"I want a Hawaiian pizza"	Agent says not on menu, suggests alternatives
"XL margherita stuffed crust + pineapple + bacon"	Prices stuffed upcharge ($3) + 2 extras ($4) correctly
"Actually change it to medium"	Recalculates from scratch
"No" to confirmation	Agent asks what to change, doesn't submit

Project 8 / 9

Section 09 · Production

Extensions

Where to take this next — production-grade enhancements.

🎤

Real-Time Voice

Azure ACS Call Automation + Azure OpenAI Realtime API. OrderAgent becomes a phone voice agent. Same logic, different I/O layer.

🧠

Order History Memory

Add get_past_orders(customer_id) tool backed by ChromaDB. Agent can offer "Same as last time?"

👥

Group Order

GroupChat with a CustomerAgent per person + one OrderCoordinatorAgent. Handles "ordering for 4 people" naturally.

☁️

Azure Deployment

Wrap main() in an Azure Function (HTTP trigger). Deploy to Container Apps. Swap OpenAI for Azure OpenAI endpoint.

📊

Observability

Add LangFuse or OpenTelemetry. Trace every agent turn, tool call, and token count. Essential for catching LLM drift in production.

🔌

MCP Tools

Replace manual tool registration with an MCP server. One server, plug into any AutoGen, LangGraph, or Claude agent.

🎓🍕

Course + Project Complete

You've covered all 8 AutoGen modules and built a working multi-agent pizza ordering system — with tool use, validation, interactive mode, and three real bugs debugged. The same pattern scales to any transactional chatbot domain.

Project 9 / 9